Receiver-Operating-Characteristic Analysis Reveals Superiority of Scale-Dependent 
Wavelet and Spectral Measures for Assessing Cardiac Dysfunction 
Receiver-operating-characteristic (ROC) analysis was used to assess the suitability of various heart 
rate variability (HRV) measures for correctly classifying electrocardiogram records of varying lengths 
as normal or revealing the presence of heart failure. Scale-dependent HRV measures were found to 
be substantially superior to scale-independent measures (scaling exponents) for discriminating the 
two classes of data over a broad range of record lengths. The wavelet-coefficient standard deviation 
at a scale near 32 heartbeat intervals, and its spectral counterpart near 1/32 cycles per interval, 
provide reliable results using record lengths just minutes long. A jittered integrate-and-fire model 
built around a fractal Gaussian-noise kernel provides a realistic, though not perfect, simulation of 
heartbeat sequences. 

PACS number(s) 87.10.+e, 87.80.+S, 87.90.+y 

Though the notion of using heart rate variability (HRV) analysis to assess the condition of the cardiovascular system 
stretches back some 40 years, its use as a noninvasive clinical tool has only recently come to the fore. A whole 
host of measures, both scale-dependent and scale-independent, have been added to the HRV armamentarium over the 
years. 

One of the more venerable among the many scale-dependent measures in the literature is the interbeat-interval 
(R-R) standard deviation $\sigma_{\rm int}$. The canonical example of a scale-independent measure is the scaling exponent $\alpha_S$ 
of the interbeat-interval power spectrum, associated with the decreasing power-law form of the spectrum at sufficiently 
low frequencies $f$: $S(f) \propto f^{-\alpha_S}$. Other scale-independent measures have been developed by us and by 
others. One of the principal goals of this Letter is to establish the relative merits of these two classes of measures, 
scale-dependent and scale-independent, for assessing cardiac dysfunction. 

One factor that can confound the reliability of a measure is the nonstationarity of the R-R time series. Multiresolution 
wavelet analysis provides an ideal means of decomposing a signal into its components at different scales [10]-[12], 
and at the same time has the salutary effect of eliminating nonstationarities [13],[14]. It is therefore ideal for examining 
both scale-dependent and scale-independent measures; it is in this latter capacity that it provides an estimate of the 
wavelet scaling exponent $\alpha_W$. 

We recently carried out a study [6] in which wavelets were used to analyze the R-R interval sequence from a standard 
electrocardiogram (ECG) database. Using the wavelet-coefficient standard deviation $\sigma_{\rm wav}(m)$, where $m = 2^r$ is 
the scale and $r$ is the scale index, we discovered a critical scale window near $m = 32$ interbeat intervals over which it 
was possible to perfectly discriminate heart-failure patients from normal subjects. The presence of this scale window 
was confirmed in an Israeli-Danish study of diabetic patients who had not yet developed clinical signs of cardiovascular 
disease. These two studies [6],[16], in conjunction with our earlier investigations which revealed a similar critical 
scale window in the counting statistics of the heartbeat [17] (as opposed to the time-interval statistics considered 
here), lead to the recognition that scales in the vicinity of $m = 32$ enjoy a special status. This conclusion has been 
borne out for a broad range of analyzing wavelets, from Daubechies 2-tap (Haar) to Daubechies 20-tap (higher-order 
analyzing wavelets are suitable for removing polynomial nonstationarities). It is clear that scale-dependent 
measures [such as $\sigma_{\rm wav}(32)$] substantially outperform scale-independent ones (such as $\alpha_S$ and $\alpha_W$) in their ability to 
discriminate patients with certain cardiac dysfunctions from normal subjects (see also [18],[19]). 
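
As an illustration of the quantity $\sigma_{\rm wav}(m)$, the following is a minimal sketch of the Haar-wavelet case, not the code used in the study. It assumes non-overlapping blocks of $m$ intervals and an orthonormal $1/\sqrt{m}$ normalization; the function name `sigma_wav` and the synthetic input are hypothetical.

```python
import numpy as np

def sigma_wav(rr, m):
    """Standard deviation of Haar wavelet coefficients at scale m.

    Assumptions (not taken from the text): m is an even power of two,
    blocks do not overlap, and coefficients carry a 1/sqrt(m) normalization.
    """
    rr = np.asarray(rr, dtype=float)
    n_blocks = rr.size // m
    if n_blocks < 2:
        raise ValueError("need at least two blocks of length m")
    blocks = rr[: n_blocks * m].reshape(n_blocks, m)
    half = m // 2
    # Haar coefficient: (sum of first half - sum of second half) / sqrt(m)
    coeffs = (blocks[:, :half].sum(axis=1) - blocks[:, half:].sum(axis=1)) / np.sqrt(m)
    return float(coeffs.std(ddof=1))

# Hypothetical input: white Gaussian "intervals", used only to exercise the function.
rng = np.random.default_rng(0)
rr_demo = rng.normal(size=2**16)
print(sigma_wav(rr_demo, 32))
```

With this normalization, unit-variance white noise gives a value near one at every scale, and a purely linear trend gives zero because every block then yields the same coefficient.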

The reduction in the value of the wavelet-coefficient standard deviation $\sigma_{\rm wav}(32)$ that leads to the scale window 
occurs not only for heart-failure patients but also for heart-failure patients with atrial fibrillation, diabetic 
patients [16], heart-transplant patients [19], and in records preceding sudden cardiac death. The depression 
of $\sigma_{\rm wav}(32)$ at these scales is likely associated with the impairment of autonomic nervous system function. Baroreflex 
modulations of the sympathetic or parasympathetic tone typically lie in the range 0.04-0.09 Hz (11-25 sec), which 
corresponds to the time range where $\sigma_{\rm wav}(m)$ is reduced. 

The perfect separation achieved in our initial study of 20-h Holter-monitor recordings endorses the choice of $\sigma_{\rm wav}(32)$ 
as a useful diagnostic measure. The results of most studies are seldom so clear-cut, however. When there is incomplete 
separation between two classes of subjects, as observed for other less discriminating measures using these identical 
long data sets, or when our measure is applied to large collections of out-of-sample or reduced-length data sets, 
an objective means for determining the relative diagnostic abilities of different measures is required. 

ROC Analysis - Receiver-operating-characteristic (ROC) analysis is an objective and highly effective technique 
for assessing the performance of a measure when it is used in binary hypothesis testing. This format provides that 
a data sample be assigned to one of two hypotheses or classes (e.g., normal or pathologic) depending on the value 
of some measured statistic relative to a threshold value. The efficacy of a measure is then judged on the basis of 
its sensitivity (the proportion of pathologic patients correctly identified) and its specificity (the proportion of control 
subjects correctly identified). The ROC curve is a graphical presentation of sensitivity versus 1 - specificity as a 
threshold parameter is swept (see Fig. 1). 
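
The threshold sweep just described can be sketched as follows. This is an illustrative sketch only: it assumes that small values of the measure indicate pathology (as for a measure that is reduced in patients), and the function names and sample values are hypothetical.

```python
import numpy as np

def roc_curve(normal, pathologic):
    """Sweep a threshold; records with measure <= threshold are called pathologic."""
    normal = np.asarray(normal, dtype=float)
    pathologic = np.asarray(pathologic, dtype=float)
    # Thresholds: -inf (nothing flagged), then every observed value in ascending order.
    observed = np.unique(np.concatenate((normal, pathologic)))
    thresholds = np.concatenate(([-np.inf], observed))
    sensitivity = np.array([(pathologic <= t).mean() for t in thresholds])
    false_alarm = np.array([(normal <= t).mean() for t in thresholds])  # 1 - specificity
    return false_alarm, sensitivity

def roc_area(false_alarm, sensitivity):
    """Trapezoidal area under the ROC curve."""
    widths = np.diff(false_alarm)
    heights = 0.5 * (sensitivity[1:] + sensitivity[:-1])
    return float(np.sum(widths * heights))

# Hypothetical measure values for four control and four pathologic records.
fa, sens = roc_curve([5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 6.5])
print(roc_area(fa, sens))
```

In this toy example 14 of the 16 (pathologic, normal) pairs are ordered correctly, so the area is 14/16 = 0.875; fully separated classes give an area of unity.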

The area under the ROC curve serves as a well-established index of diagnostic accuracy; a value of 0.5 arises 
from assignment to a class by pure chance whereas the maximum value of 1.0 corresponds to perfect assignment (unity 
sensitivity for all values of specificity). ROC analysis can be used to choose the best of a host of different candidate 
diagnostic measures by comparing their ROC areas, or to establish for a single measure the tradeoff between reduced 
data length and misidentifications (misses and false positives) by examining ROC area as a function of record length 
(see Fig. 2). A minimum record length can then be specified to achieve acceptable classification. Because ROC 
analysis relies on no implicit assumptions about the statistical nature of the data set [18,23], it is more reliable and 
appropriate for analyzing non-Gaussian time series than are measures of statistical significance such as p-value and 
$d'$, which are expressly designed for signals with Gaussian statistics [18]. Moreover, ROC curves are insensitive to the 
units employed (e.g., spectral magnitude, magnitude squared, or log magnitude); ROC curves for a measure $M$ are 
identical to those for any monotonic transformation thereof such as $M^x$ or $\log(M)$. In contrast, the values of $d'$, and 
its closely related cousins, change under such transformations. Unfortunately, this is not always recognized, which 
leads some authors to specious conclusions. 
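
The invariance of ROC area under monotonic transformations, and the lack of such invariance for $d'$, can be checked numerically. The sketch below is an assumption-laden illustration: the ROC area is computed as the fraction of correctly ordered (pathologic, normal) pairs, $d'$ is taken in one common form (difference of means divided by the root-mean-square of the two standard deviations), and the lognormal samples are synthetic stand-ins for a non-Gaussian measure.

```python
import numpy as np

def roc_area_ranks(normal, pathologic):
    """ROC area as the fraction of (pathologic, normal) pairs ordered correctly; ties count 1/2."""
    n = np.asarray(normal, dtype=float)[None, :]
    p = np.asarray(pathologic, dtype=float)[:, None]
    return float(np.mean((p < n) + 0.5 * (p == n)))

def d_prime(normal, pathologic):
    """One common form of d' (an assumption here): mean difference over RMS standard deviation."""
    n = np.asarray(normal, dtype=float)
    p = np.asarray(pathologic, dtype=float)
    return float((n.mean() - p.mean()) / np.sqrt(0.5 * (n.var(ddof=1) + p.var(ddof=1))))

# Synthetic, positive, non-Gaussian measure values (hypothetical data).
rng = np.random.default_rng(0)
normal = rng.lognormal(mean=1.0, sigma=0.8, size=200)
pathologic = rng.lognormal(mean=0.3, sigma=0.8, size=200)

area_raw = roc_area_ranks(normal, pathologic)
area_log = roc_area_ranks(np.log(normal), np.log(pathologic))
area_sq = roc_area_ranks(normal**2, pathologic**2)
d_raw = d_prime(normal, pathologic)
d_log = d_prime(np.log(normal), np.log(pathologic))

print(area_raw, area_log, area_sq)  # the three areas agree
print(d_raw, d_log)                 # d' depends on the units chosen
```

The three areas coincide because the transformations preserve the ordering of the values, which is all the ROC curve depends on, whereas $d'$ changes with the means and variances of the transformed values.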

Scale-Dependent vs Scale-Independent Measures - Wavelet analysis provides a ready comparison for scale-dependent 
and scale-independent measures since it reveals both. ROC curves constructed using 75,821 R-R intervals from each 
of the 24 data sets (12 heart failure, 12 normal) [15] are presented in Fig. 1 (left) for the wavelet measure $\sigma_{\rm wav}(32)$ 
(using the Haar wavelet) as well as for the wavelet measure $\alpha_W$. It is clear from Fig. 1 that the area under the 
$\sigma_{\rm wav}(32)$ ROC curve is unity, indicating perfect discriminability. This scale-dependent measure clearly outperforms 
the scale-independent measure $\alpha_W$, which has significantly smaller area. These results are found to be essentially 
independent of the analyzing wavelet. 

We now use ROC analysis to quantitatively compare the tradeoff between reduced record length and misidentifications 
for this standard set of heart-failure patients using three scale-dependent and three scale-independent measures. 
In the first category are the wavelet-coefficient standard deviation $\sigma_{\rm wav}(32)$, its spectral counterpart $S(1/32)$, 
and the interbeat-interval standard deviation $\sigma_{\rm int}$. In the second category, we consider the wavelet scaling exponent 
$\alpha_W$, the spectral scaling exponent $\alpha_S$, and a scaling exponent $\alpha_D$ calculated according to detrended fluctuation 
analysis (DFA). 

In Fig. 2 (left) we present ROC area, as a function of R-R interval record length, using these six measures. The area 
under the ROC curves forms the rightmost point in the ROC area curves. The file sizes are then divided into smaller 
segments of length L. The area under the ROC curve is computed for the first such segment for all 6 measures, and 
then for the second segment, and so on for all segments of length L. From the $L_{\max}/L$ values of the ROC area, the 
mean and standard deviation are computed. The lengths L employed range from $L = 2^6 = 64$ to $L = 2^{16} = 65{,}536$ 
in powers of two. 
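
The segmentation procedure can be sketched as follows. This is a minimal sketch under stated assumptions: the measure is passed in as a callable (here the interbeat-interval standard deviation via `np.std`), smaller values are taken to indicate pathology, the population form of the standard deviation is used for the spread, and the records are synthetic placeholders rather than the ECG data. All function names are hypothetical.

```python
import numpy as np

def roc_area(normal_vals, pathologic_vals):
    """Fraction of (pathologic, normal) pairs ordered correctly; ties count 1/2."""
    n = np.asarray(normal_vals, dtype=float)[None, :]
    p = np.asarray(pathologic_vals, dtype=float)[:, None]
    return float(np.mean((p < n) + 0.5 * (p == n)))

def roc_area_vs_length(normal_records, pathologic_records, measure, lengths):
    """For each segment length L, return (mean, std) of ROC area over the L_max // L segments."""
    records = list(normal_records) + list(pathologic_records)
    l_max = min(len(r) for r in records)
    results = {}
    for L in lengths:
        if L > l_max:
            continue  # no complete segment of this length is available
        areas = []
        for k in range(l_max // L):
            seg = slice(k * L, (k + 1) * L)
            n_vals = [measure(np.asarray(r)[seg]) for r in normal_records]
            p_vals = [measure(np.asarray(r)[seg]) for r in pathologic_records]
            areas.append(roc_area(n_vals, p_vals))
        results[L] = (float(np.mean(areas)), float(np.std(areas)))
    return results

# Synthetic placeholder records: 12 "normal" and 12 "pathologic" interval sequences.
rng = np.random.default_rng(1)
normal = [0.8 + 0.06 * rng.normal(size=4096) for _ in range(12)]
pathologic = [0.7 + 0.03 * rng.normal(size=4096) for _ in range(12)]
areas = roc_area_vs_length(normal, pathologic, np.std, [2**k for k in range(6, 13)])
for L, (mean_area, std_area) in areas.items():
    print(L, mean_area, std_area)
```

For the longest length there is a single segment, so the standard deviation of the area is zero by construction; the spread becomes informative only at the shorter lengths, where many segments are available.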

The best performance is achieved by $\sigma_{\rm wav}(32)$ and $S(1/32)$, both of which attain unity area (perfect separation) 
for sufficiently long R-R sequences. Even for fewer than 100 heartbeat intervals, corresponding to just a few minutes 
of data, these measures provide excellent results (in spite of the fact that both diurnal and nocturnal records are 
included). $\sigma_{\rm int}$ does not perform quite as well. The worst performance, however, is provided by the three scaling 
exponents $\alpha_W$, $\alpha_S$, and $\alpha_D$, confirming our previous findings. Moreover, results obtained from the different 
power-law estimators differ widely, suggesting that there is little merit in the concept of a single exponent, no less 
a "universal" one, for characterizing the human heartbeat sequence. In a recent paper Amaral et al. [21] conclude 
exactly the opposite, that the scaling exponents provide the best performance. This is because they improperly make 
use of the Gaussian-based measures $d^2$ and $\eta$, which are closely related to $d'$, rather than ROC analysis. These same 
authors [21] also purport to glean information from higher moments of the wavelet coefficients, but such information 
is not reliable because estimator variance increases with moment order. The results presented here accord with those 
obtained in a detailed study of 16 different measures of HRV [18]. There are vast differences in the time required 
to compute these measures, however: for 75,821 interbeat intervals, $\sigma_{\rm wav}(32)$ requires the shortest time (20 msec) 
whereas DFA(32) requires the longest time (650,090 msec). 

It will be highly useful to evaluate the relative performance of these measures for other records, both normal and 
pathologic. In particular the correlation of ROC area with severity of cardiac dysfunction should be examined. 
An issue of importance is whether the R-R sequences, and therefore the ROC curves, arise from deterministic chaos. 
We have carried out a phase-space analysis in which differences between adjacent R-R intervals are embedded. 
This minimizes correlation in the time series, which can interfere with the detection of deterministic dynamics. The 
results indicate that the behavior of the underlying R-R sequences, both normal and pathological, appears to have 
stochastic rather than deterministic origins [18]. 

Generating a realistic heartbeat sequence - The generation of a mathematical point process that faithfully emulates 
the human heartbeat could be of importance in a number of venues, including pacemaker excitation. Integrate-and-fire 
(IF) models, which are physiologically plausible, have been developed for use in cardiology. Berger et al., 
for example, constructed an integrate-and-fire model in which an underlying rate function was integrated until it 
reached a fixed threshold, whereupon a point event was triggered and the integrator reset. Improved agreement with 
experiment was obtained by modeling the stochastic component of the rate function as band-limited fractal Gaussian 
noise (FGN), which introduces scaling behavior into the heart rate, and setting the threshold equal to unity. This 
fractal-Gaussian-noise integrate-and-fire (FGNIF) model has been quite successful in fitting a whole host of interval- 
and count-based measures of the heartbeat sequence for both heart-failure patients and normal subjects. However, 
it is not able to accommodate the differences observed in the behavior of $\sigma_{\rm wav}(m)$ for the two classes of data. 

To remedy this defect, we have constructed a jittered version of this model which we dub the fractal-Gaussian-noise 
jittered integrate-and-fire (FGNJIF) model. The occurrence time of each point of the FGNIF is jittered by a 
Gaussian distribution of standard deviation $J$. Increasing the jitter parameter imparts additional randomness to the 
R-R time series at small scales, thereby increasing $\sigma_{\rm wav}$ at small values of $m$ and, concomitantly, the power spectral 
density at large values of the frequency $f$. The FGNJIF simulation does a rather good job of mimicking patient 
and control data for a number of key measures used in heart-rate-variability analysis. The model is least successful 
in fitting the interbeat-interval histogram $p_\tau(\tau)$, particularly for heart-failure patients. This indicates that a 
mechanism other than jitter for increasing $\sigma_{\rm wav}$ at low scales should be sought. 
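
The integrate-and-fire and jitter steps can be illustrated with a short sketch. It is not the FGNJIF simulation itself: the rate function below is a constant plus white Gaussian noise, used as a stand-in for band-limited fractal Gaussian noise, and the mean rate, noise level, time step, and value of $J$ are hypothetical. Re-sorting the event times after jittering is likewise an assumption, made so that all intervals remain positive.

```python
import numpy as np

def integrate_and_fire(rate, dt, threshold=1.0):
    """Event times at which the running integral of a non-negative rate crosses
    successive multiples of the threshold (equivalent to resetting the integrator
    to zero at each firing)."""
    cumulative = np.cumsum(np.asarray(rate, dtype=float)) * dt
    n_events = int(cumulative[-1] // threshold)
    levels = threshold * np.arange(1, n_events + 1)
    # Index of the first sample at which the integral reaches each level.
    idx = np.searchsorted(cumulative, levels)
    return (idx + 1) * dt

def jitter_events(times, J, rng):
    """Displace each event time by zero-mean Gaussian jitter of standard deviation J."""
    jittered = times + rng.normal(0.0, J, size=times.size)
    return np.sort(jittered)  # assumption: keep the events in time order

rng = np.random.default_rng(2)
dt = 0.01                                   # seconds per sample (hypothetical)
n_samples = 60_000                          # ten minutes of simulated time
# Stand-in rate: 1.25 events/s plus white noise, clipped so the integral never decreases.
rate = np.clip(1.25 + 0.2 * rng.normal(size=n_samples), 0.0, None)

times = integrate_and_fire(rate, dt)
jittered = jitter_events(times, J=0.02, rng=rng)

print(times.size, np.std(np.diff(times)), np.std(np.diff(jittered)))
```

In this toy setting the jitter raises the standard deviation of the simulated intervals, consistent with the added small-scale randomness described above; reproducing the scaling behavior would require replacing the white-noise stand-in with a fractal-Gaussian-noise generator.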

It is of interest to examine the global performance of the FGNJIF model using the collection of 24 data sets. To 
achieve this we carried out FGNJIF simulations using parameters comparable with the actual data and constructed 
simulated ROC curves for the measures $\sigma_{\rm wav}(32)$ and $\alpha_W$ as shown in Fig. 1 (right). Similar simulations for ROC area 
versus record length are displayed in Fig. 2 (right) for the six experimental measures considered. Overall, the global 
simulations (right-hand side of Figs. 1 and 2) follow the trends of the data (left-hand side of Figs. 1 and 2) reasonably 
well, with the exception of $\sigma_{\rm int}$. This failure is linked to the inability of the simulated results to emulate the observed 
interbeat-interval histograms. It will be of interest to consider modifications of the FGNIF model that might bring 
the simulated ROC curves into better accord with the data-based curves. 
FIG. 1. ROC curves (sensitivity vs 1 - specificity) for two wavelet-based measures: $\sigma_{\rm wav}(32)$, which is scale-dependent, and 
$\alpha_W$, which is scale-independent. Left: ROC curves obtained using all 24 data records, each comprising 75,821 interbeat intervals 
[15]. The scale-dependent measure outperforms the scale-independent one since its ROC area is greater. Right: Comparable 
result obtained using simulations for the fractal-Gaussian-noise jittered integrate-and-fire (FGNJIF) model. 



FIG. 2. Diagnostic accuracy (area under ROC curve) vs data length (number of R-R intervals) for three scale-dependent and 
three scale-independent measures (mean ± one standard deviation). An area of unity corresponds to the correct assignment of 
each patient to the appropriate class. Left: $\sigma_{\rm wav}(32)$ and $S(1/32)$ provide excellent performance, attaining unity area (perfect 
separation) for 32,768 (or more) R-R intervals. These measures continue to perform well even as the number of R-R intervals 
decreases below 100, corresponding to record lengths just minutes long. The performance of $\sigma_{\rm int}$ is seen to be slightly inferior. 
In contrast, all three scale-independent measures perform poorly. Right: Similar results are obtained using 24 simulations of 
the FGNJIF model, with the exception of $\sigma_{\rm int}$ (see text). 